Grouping and Aggregation
Grouping
In data preparation, the grouping function combines all records that have identical values in a particular field, or combination of fields, into a single record.
Grouping reduces the size of the dataset being analyzed. It is therefore important that you know your data well enough to create groups that provide meaningful results. Once the data is grouped, aggregate operations can be performed on each group.
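As a minimal sketch of the idea, the following Python example groups a small list of records by one field and then by a combination of fields. The sample records, the field names (region, product, sales) and the helper group_by are hypothetical and are used only for illustration; they are not part of any product API.

```python
from collections import defaultdict

# Hypothetical sample records; the field names are illustrative only.
records = [
    {"region": "North", "product": "A", "sales": 120},
    {"region": "North", "product": "A", "sales": 80},
    {"region": "South", "product": "A", "sales": 200},
    {"region": "North", "product": "B", "sales": 50},
    {"region": "South", "product": "A", "sales": 100},
]


def group_by(rows, fields):
    """Collect rows that share identical values in `fields` under one key."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[f] for f in fields)
        groups[key].append(row)
    return dict(groups)


# Group on a single field, then on a combination of fields.
by_region = group_by(records, ["region"])
by_region_product = group_by(records, ["region", "product"])

# Applying an aggregate (here, Sum) collapses each group into a single record.
sales_by_region = {
    key: sum(row["sales"] for row in rows) for key, rows in by_region.items()
}
```

Five input records become two groups when grouped by region and three groups when grouped by region and product, which shows how the choice of grouping fields controls how far the dataset is reduced.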
Aggregation
Aggregation refers to mathematical operations that return a single value from a list of input values.
Data preparation aggregate operations include:
- Average: sum of all values in the list divided by the number of items in the list
- Count: number of dataset rows
- Count Distinct: number of distinct data values. A dataset with a column containing the string values Group A, Group B and Null has a distinct count of 3.
- Maximum: greatest value in the set
- Minimum: least value in the set
- Standard Deviation: measure of the spread of the data, and thus the amount of variation from the mean value. This value is the square root of the sample variance.
- Standard Deviation Population: measure of the spread of the data, and thus the amount of variation from the mean value. This value is the square root of the variance of the whole population.
- Sum: total of all the numbers in the set
- Variance: measure of the variation within the data. This value is the unbiased sample variance, calculated by dividing by n-1, where n is the number of data records.
- Variance Population: measure of the variation within the data. This value is the variance of the whole population, calculated by dividing by the total number of data records, n.
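The operations listed above can be sketched with Python's standard statistics module. This is an illustration of the definitions only, not the product's implementation; the sample values, the labels list and the dictionary keys are assumptions chosen for the example. Count Distinct follows the convention stated above, in which Null (None here) counts as a distinct value.

```python
import statistics

# Hypothetical sample data, chosen so the results are easy to check by hand.
values = [2, 4, 4, 4, 5, 5, 7, 9]
labels = ["Group A", "Group B", None, "Group A"]

aggregates = {
    "average": statistics.mean(values),
    "count": len(values),
    # None is treated as a distinct value, following the convention above.
    "count_distinct": len(set(labels)),
    "maximum": max(values),
    "minimum": min(values),
    "sum": sum(values),
    # Sample statistics divide by n-1.
    "variance": statistics.variance(values),
    "standard_deviation": statistics.stdev(values),
    # Population statistics divide by n.
    "variance_population": statistics.pvariance(values),
    "standard_deviation_population": statistics.pstdev(values),
}

for name, value in aggregates.items():
    print(f"{name}: {value}")
```

For this data the mean is 5 and the squared deviations sum to 32, so the population variance is 32/8 = 4 while the sample variance is 32/7, which is slightly larger because of the n-1 divisor.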